137 research outputs found

    A possibilistic approach to latent structure analysis for symmetric fuzzy data.

    In many situations the available amount of data is huge and can be intractable. When the data set is single valued, latent structure models are recognized techniques that provide a useful compression of the information. This is done by considering a regression model between observed and unobserved (latent) variables. In this paper, an extension of latent structure analysis to deal with fuzzy data is proposed. Our extension follows the possibilistic approach, widely used in both the clustering and regression frameworks. In this case, the possibilistic approach involves formulating latent structure analysis for fuzzy data as an optimization problem. Specifically, a non-linear programming problem in which the fuzziness of the model is minimized is introduced. In order to show how our model works, the results of two applications are given. Keywords: Latent structure analysis, symmetric fuzzy data set, possibilistic approach.

    A least squares approach to Principal Component Analysis for interval valued data

    Principal Component Analysis (PCA) is a well-known technique that aims to synthesize huge amounts of numerical data by means of a small number of unobserved variables, called components. In this paper, an extension of PCA to deal with interval-valued data is proposed. The method, called Midpoint Radius Principal Component Analysis (MR-PCA), recovers the underlying structure of interval-valued data by using both the midpoints (or centers) and the radii (a measure of the interval width). In order to analyze how MR-PCA works, the results of a simulation study and two applications on chemical data are presented. Keywords: Principal Component Analysis, Least squares approach, Interval valued data, Chemical data.
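
The midpoint and radius representation mentioned in this abstract can be illustrated with a minimal sketch. It only converts interval bounds into midpoints and radii (taking the radius as half the interval width); it is not the MR-PCA method itself, and the function name `midpoint_radius` is hypothetical.

```python
def midpoint_radius(lower, upper):
    """Convert interval bounds into (midpoints, radii).

    Assumption: the radius is half the interval width, so each
    interval [lo, hi] equals [mid - rad, mid + rad].
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    midpoints, radii = [], []
    for lo, hi in zip(lower, upper):
        if lo > hi:
            raise ValueError("each lower bound must not exceed its upper bound")
        midpoints.append((lo + hi) / 2)
        radii.append((hi - lo) / 2)
    return midpoints, radii


# Two intervals: [1, 3] and [4, 10]
mids, rads = midpoint_radius([1.0, 4.0], [3.0, 10.0])
print(mids, rads)
```

A method working on interval-valued data can then treat the midpoints and radii as two ordinary numerical data sets.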

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. After only one year, the first effects of this seminal paper began to emerge, with the pioneering paper on clustering by Bellman, Kalaba and Zadeh [33], in which they proposed a prototypal clustering algorithm based on fuzzy sets theory.

    Fuzzy C-ordered medoids clustering of interval-valued data

    Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a partitioning around medoids approach. However, one of the greatest disadvantages of this method is its sensitivity to the presence of outliers in the data. This paper introduces a new robust fuzzy clustering method named Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). Huber's M-estimators and Yager's Ordered Weighted Averaging (OWA) operators are used in the proposed method to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids method in experiments performed on synthetic data with different types of outliers. A real application of the FcOMdC-ID is also provided.
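
As a rough illustration of the OWA operator named in this abstract, the sketch below implements the generic definition: the values are sorted in descending order and then combined with a fixed weight vector that sums to one. How FcOMdC-ID combines OWA with Huber's M-estimators is not described in the abstract, so this is only the generic operator, and the function name `owa` is hypothetical.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to ranks, not to items.

    The i-th weight multiplies the i-th largest value.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # largest value first
    return sum(w * v for w, v in zip(weights, ordered))


# A zero weight on the largest rank mutes a single extreme value.
print(owa([100.0, 1.0, 2.0], [0.0, 0.5, 0.5]))
```

Putting little or no weight on the largest ranks is one way such an operator can reduce the influence of outlying distances.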

    A fuzzy taxonomy for e-Health projects

    Evaluating the impact of Information Technology (IT) projects represents a problematic task for policy and decision makers aiming to define roadmaps based on previous experiences. Especially in the healthcare sector, IT can support a wide range of processes, and it is difficult to analyze in a comparative way the benefits and results of e-Health practices in order to define strategies and to assign priorities to potential investments. A first step towards the definition of an evaluation framework to compare e-Health initiatives consists in the definition of clusters of homogeneous projects that can be further analyzed through multiple case studies. However, imprecision and subjectivity affect the classification of e-Health projects that are focused on multiple aspects of the complex healthcare system scenario. In this paper we apply a method, based on advanced cluster techniques and fuzzy theories, for validating a project taxonomy in the e-Health sector. An empirical test of the method has been performed over a set of European good practices in order to define a taxonomy for classifying e-Health projects. Articles published in or submitted to a Journal without IF refereed / of international relevanc

    Quantile-Based Fuzzy Clustering of Multivariate Time Series in the Frequency Domain

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract] A novel procedure to perform fuzzy clustering of multivariate time series generated from different dependence models is proposed. Different amounts of dissimilarity between the generating models or changes in the dynamic behaviours over time are some arguments justifying a fuzzy approach, where each series is associated with all the clusters with specific membership levels. Our procedure considers quantile-based cross-spectral features and consists of three stages: (i) each element is characterized by a vector of proper estimates of the quantile cross-spectral densities, (ii) principal component analysis is carried out to capture the main differences while reducing the effects of the noise, and (iii) the squared Euclidean distance between the first retained principal components is used to perform clustering through the standard fuzzy C-means and fuzzy C-medoids algorithms. The performance of the proposed approach is evaluated in a broad simulation study where several types of generating processes are considered, including linear, nonlinear and dynamic conditional correlation models. Assessment is done in two different ways: by directly measuring the quality of the resulting fuzzy partition and by taking into account the ability of the technique to determine the overlapping nature of series located equidistant from well-defined clusters. The procedure is compared with the few alternatives suggested in the literature, substantially outperforming all of them whatever the underlying process and the evaluation scheme. Two specific applications involving air quality and financial databases illustrate the usefulness of our approach.
    The authors are grateful to the anonymous referees for their comments and suggestions. The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. Xunta de Galicia; ED431C-2020-14. Xunta de Galicia; ED431G 2019/0
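
The idea in this abstract that each series belongs to all clusters with specific membership levels can be sketched with the standard fuzzy C-means membership formula. The sketch assumes the squared Euclidean distances from one series to each cluster centroid are already available; the function name `fcm_memberships` and the fuzziness parameter `m` are illustrative, and the feature extraction and PCA stages are not reproduced.

```python
def fcm_memberships(sq_dists, m=2.0):
    """Standard fuzzy C-means memberships of one item across clusters.

    sq_dists: squared distances from the item to each centroid.
    m: fuzziness parameter (> 1); larger values give softer partitions.
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    # An item sitting exactly on one or more centroids belongs only to them.
    zeros = [i for i, d in enumerate(sq_dists) if d == 0]
    if zeros:
        share = 1.0 / len(zeros)
        return [share if i in zeros else 0.0 for i in range(len(sq_dists))]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]  # memberships sum to 1


# An item three times farther (in squared distance) from the second centroid.
print(fcm_memberships([1.0, 3.0]))
```

An item equidistant from two centroids receives equal memberships, which corresponds to the overlapping series the abstract refers to.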

    A Bayesian network to analyse basketball players’ performances: a multivariate copula-based approach

    Statistics in sports plays a key role in predicting winning strategies and providing objective performance indicators. Despite the growing interest in recent years in using statistical methodologies in this field, less emphasis has been given to the multivariate approach. This work aims at using Bayesian networks to model the joint distribution of a set of indicators of players’ performances in basketball in order to discover the set of their probabilistic relationships as well as the main determinants affecting the player’s winning percentage. From a methodological point of view, the interest is to define a suitable model for non-Gaussian data, relaxing the strong assumption of normality in favour of a Gaussian copula. Through the estimated Bayesian network, we discovered many interesting dependence relationships, providing a scientific validation of some known results mainly based on experience. Finally, some scenarios of interest have been simulated to understand the main determinants that contribute to increasing the number of games won by a player.

    Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract]: Two novel distances between categorical time series are introduced. Both of them measure discrepancies between extracted features describing the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's v and Cohen's κ. The other one relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures which make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models and are computationally efficient. Extensive simulation studies show that both hard and soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C-2020-14. The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.
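
The binarization step described in this abstract, where each category is indicated by a canonical vector, can be sketched as a one-hot encoding of the series. This is only the encoding; the association measures and distances built on it are not reproduced, and the function name `binarize` and the DNA-style alphabet are illustrative assumptions.

```python
def binarize(series, categories):
    """Map each observation to a canonical (one-hot) vector.

    The k-th entry is 1 when the observation equals the k-th category
    and 0 otherwise.
    """
    index = {c: k for k, c in enumerate(categories)}
    encoded = []
    for obs in series:
        if obs not in index:
            raise ValueError(f"unknown category: {obs!r}")
        vec = [0] * len(categories)
        vec[index[obs]] = 1
        encoded.append(vec)
    return encoded


# Hypothetical short sequence over the alphabet A, C, G, T.
print(binarize(["A", "C", "A", "T"], ["A", "C", "G", "T"]))
```

Each column of the result is a binary series marking when one category occurs, so tools for numerical series can be applied to it.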

    Fuzzy clustering of spatial interval-valued data

    In this paper, two fuzzy clustering methods for spatial interval-valued data are proposed, i.e. the fuzzy C-Medoids clustering of spatial interval-valued data with and without entropy regularization. Both methods are based on the Partitioning Around Medoids (PAM) algorithm, inheriting the great advantage of obtaining non-fictitious representative units for each cluster. In both methods, the units are endowed with a relation of contiguity, represented by a symmetric binary matrix. This can be understood both as contiguity in a physical space and as a more abstract notion of contiguity. The performance of the methods is assessed by simulation, testing the methods with different contiguity matrices associated with natural clusters of units. In order to show the effectiveness of the methods in empirical studies, three applications are presented: the clustering of municipalities based on interval-valued pollutant levels, the clustering of European fact-checkers based on interval-valued data on the average number of impressions received by their tweets, and the clustering of the residential zones of the city of Rome based on the interval of price values.